Abstract: A challenging and major role of the doctor is to predict and diagnose diseases that affect the human body. The Typical Genomic Framework on Disease Analysis algorithm is designed to store and process the characteristics of every gene, such as shape, weight, location and normal growth culture. Whenever a disease report is fed into this data mining algorithm, it triggers a similarity test built on data mining classification rules. A gene usually consists of hundreds of individual nucleotides arranged in a particular order, and there is an almost unlimited number of ways in which nucleotides can be ordered and sequenced to form distinct genes. The difference that the algorithm reports between diseased and healthy status guides the conclusion about the severity, stage and nature of the disease. The Typical Genomic Framework on Disease Analysis (TGFDA) algorithm is built to deliver fast results over a Very Large Database (VLDB) using a density- and weight-based clustering algorithm.

Keywords: Association Rules, Classification Rules, Very Large Database, Similarity Analysis, Density- and Weight-Based Clustering Analysis

I. INTRODUCTION 

In data mining, many new techniques and tracking algorithms have emerged. Integrating the fast-growing fields of medical science and computer science in a single framework can benefit human health. The main emphasis of this algorithm is on accessing and analyzing gene sets, and similarity search and density-based clustering analysis form the cornerstone of the TGFDA algorithm. The TGFDA algorithm is organized as four phase capsules, as shown in Figure 1. In the first phase, healthy tissue gene attributes such as weight, size, location, shape and strength are fed into a database (Original Data) for the clustering and similarity analysis described below.
Figure 1: TGFDA Four Phase Capsules

Indexing the VLDB on size then gives easy access to the gene attributes. However, owing to the large volume of data in a data warehouse, accessing data through indexing alone is difficult and time consuming. Therefore, the second phase adopts a weight- and density-based clustering approach to speed up access. The third phase performs the data analysis: a similarity analysis compares the gene's weight, location, size, shape and strength with the healthy tissue status of the gene. The final phase provides the result after the deviation status of the present gene has been checked against the set of diseases loaded in the VLDB.
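The first-phase size index can be illustrated with a sorted index that answers size-range queries by binary search. This is a minimal Python sketch, not the TGFDA implementation: the `SizeIndex` class and the sample gene names are hypothetical. Lookups take logarithmic time; inserting into a Python list is linear, so a production VLDB would rely on a B-tree or the database's own index instead.

```python
from bisect import bisect_left, bisect_right


class SizeIndex:
    """Hypothetical sorted index mapping gene size to gene records."""

    def __init__(self):
        self._sizes = []    # sizes kept in ascending order
        self._records = []  # records stored in the same order as _sizes

    def add(self, size, record):
        # Insert after any equal sizes so records with the same size keep arrival order.
        pos = bisect_right(self._sizes, size)
        self._sizes.insert(pos, size)
        self._records.insert(pos, record)

    def range(self, low, high):
        """Return all records whose size lies in the closed interval [low, high]."""
        start = bisect_left(self._sizes, low)
        end = bisect_right(self._sizes, high)
        return self._records[start:end]


index = SizeIndex()
for size, gene in [(12, "gene_a"), (30, "gene_b"), (18, "gene_c")]:
    index.add(size, gene)
print(index.range(10, 20))  # ['gene_a', 'gene_c']
```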

II. SEMANTIC INTEGRATION OF DISTRIBUTED GENOME DATABASES

The Typical Genomic Framework on Disease Analysis (TGFDA) takes gene sets as input. Because a wide variety of DNA data is generated and used in a highly distributed and uncontrolled manner, the semantic integration of such heterogeneous and widely distributed genome databases becomes an important task for the systematic and coordinated analysis of DNA databases. TGFDA is designed to take in the following gene-set attributes: weight, size, location, shape and strength.
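One way to picture the semantic integration step is a mapping from each source's field names onto one common record type. The sketch below is hypothetical: the `GeneRecord` schema follows the five gene-set attributes plus a gene identifier, while the source names (`source_a`, `source_b`), their field names and the sample values are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneRecord:
    """Common schema for the gene-set attributes taken in by the framework."""
    gene_id: str
    location: str
    size: float
    shape: str
    weight: float
    strength: float


# Hypothetical field-name mappings for two differently structured sources.
SOURCE_FIELD_MAPS = {
    "source_a": {"gene_id": "gene", "location": "loc", "size": "len",
                 "shape": "form", "weight": "wt", "strength": "str"},
    "source_b": {"gene_id": "GeneID", "location": "Locus", "size": "Size",
                 "shape": "Shape", "weight": "Weight", "strength": "Strength"},
}


def integrate(source, raw):
    """Translate one raw record from `source` into a GeneRecord."""
    mapping = SOURCE_FIELD_MAPS[source]
    values = {field: raw[src_key] for field, src_key in mapping.items()}
    for numeric in ("size", "weight", "strength"):
        values[numeric] = float(values[numeric])  # unify text and numeric encodings
    return GeneRecord(**values)


a = integrate("source_a", {"gene": "GENE_X", "loc": "L1", "len": "81",
                           "form": "linear", "wt": "18", "str": "0.9"})
b = integrate("source_b", {"GeneID": "GENE_X", "Locus": "L1", "Size": 81,
                           "Shape": "linear", "Weight": 18, "Strength": 0.9})
print(a == b)  # True
```

Once both sources are expressed in the same record type, the later clustering and similarity steps can treat them uniformly.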
Figure 2: Ordered and Unordered Clustering Derivations



The details in Table 1 are classified and indexed by the weight and location of the gene. A disease is placed on the upper side when the deviation is increasing (positive values) and on the lower side when the deviation is decreasing (negative values). A raw VLDB stores data without a rigid format, so a search must hit (examine) many records to obtain similar data, which also takes more time to produce a result. Organizing the gene sets into the clustering blocks shown in Figure 2 reduces both the hit count and the execution time.
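The effect of clustering on hit count can be made concrete by counting how many records each search examines. In this minimal sketch, assuming the weight bands 1 to 20 and above 20 up to 50 used by the paper's clusters, records are grouped by (weight band, location) and a query inspects only its own group. The band function, record layout and synthetic data are hypothetical. As in the similarity test, the query's weight selects the group, so a record filed under a different band would not be found.

```python
from collections import defaultdict


def weight_band(weight):
    """Hypothetical band assignment mirroring VLDB_cluster20 and VLDB_cluster50."""
    if 1 <= weight <= 20:
        return "cluster20"
    if 20 < weight <= 50:
        return "cluster50"
    return "general"


def build_clusters(records):
    """Group records into blocks keyed by (weight band, location)."""
    clusters = defaultdict(list)
    for rec in records:
        clusters[(weight_band(rec["weight"]), rec["location"])].append(rec)
    return clusters


def full_scan(records, gene_id, location):
    """Search the raw database; return (matches, number of records examined)."""
    matches = [r for r in records
               if r["gene_id"] == gene_id and r["location"] == location]
    return matches, len(records)


def clustered_search(clusters, gene_id, location, weight):
    """Search only the block selected by the query's weight band and location."""
    block = clusters.get((weight_band(weight), location), [])
    matches = [r for r in block if r["gene_id"] == gene_id]
    return matches, len(block)


# Synthetic data: 100 records spread over two locations and weights 0..59.
records = [{"gene_id": f"G{i}", "location": "L1" if i % 2 else "L2", "weight": i % 60}
           for i in range(100)]
clusters = build_clusters(records)
print(full_scan(records, "G7", "L1")[1])             # 100
print(clustered_search(clusters, "G7", "L1", 7)[1])  # 20
```

Both searches return the same match here, but the clustered search examines one fifth of the records.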

III. OPTIMAL LINK STATE 

Linking and accessing a VLDB is a major bottleneck in real-time transaction processing. To reduce the time needed to find the required data in the VLDB, the entire database is clustered into several pieces based on weight and location, so that similarity analysis can be performed on the much smaller clustered data.

For example, consider a shopping complex with a wide variety of items. Imagine what would happen if all the items were dumped into a single room with no specific identifiers: confusion would arise, and it would take much longer to find what we want. The same situation arises in data mining, where the users are the customers and the raw VLDB records are the items spread over the data warehouse. We remove this bottleneck by building separate rooms that hold different kinds of items (clusters). This lets us go directly to what we want instead of searching the entire shopping complex.

Algorithm for clustering

// healthy tissues
Function VLDB_cluster(attributes)
begin
    intake (attributes)
    if (weight >= 1 and weight <= 20) then
        VLDB_cluster20(attributes)
    else if (weight > 20 and weight <= 50) then
        VLDB_cluster50(attributes)
    else
        // remaining weights stay in the general (unclustered) VLDB
    end if
end function

Function VLDB_cluster20(attributes)
begin
    // stores gene, location, size, shape, weight and strength
    index on weight, location
end function

Function VLDB_cluster50(attributes)
begin
    // stores gene, location, size, shape, weight and strength
    index on weight, location
end function
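A runnable Python sketch of the clustering routine follows, assuming that weights from 1 to 20 go to VLDB_cluster20, weights above 20 up to 50 go to VLDB_cluster50 (the bands used by the similarity test in Section IV), and everything else stays in a general store. The `Cluster` class and the name `VLDB_general` are hypothetical; each cluster keeps its records plus a dictionary index on (weight, location).

```python
from collections import defaultdict


class Cluster:
    """Hypothetical cluster of healthy-tissue records with an index on (weight, location)."""

    def __init__(self, name):
        self.name = name
        self.records = []
        self.index = defaultdict(list)  # (weight, location) -> list of records

    def store(self, attributes):
        # Keep the record as given (gene, location, size, shape, weight, strength).
        self.records.append(attributes)
        self.index[(attributes["weight"], attributes["location"])].append(attributes)


def make_clusters():
    return {name: Cluster(name)
            for name in ("VLDB_cluster20", "VLDB_cluster50", "VLDB_general")}


def vldb_cluster(attributes, clusters):
    """Route one healthy-tissue record to its weight cluster; return the cluster name."""
    weight = attributes["weight"]
    if 1 <= weight <= 20:
        name = "VLDB_cluster20"
    elif 20 < weight <= 50:
        name = "VLDB_cluster50"
    else:
        name = "VLDB_general"
    clusters[name].store(attributes)
    return name


clusters = make_clusters()
rec = {"gene_id": "G1", "location": "L1", "size": 80.0,
       "shape": "linear", "weight": 20, "strength": 0.9}
print(vldb_cluster(rec, clusters))  # VLDB_cluster20
```

Unlike the recursive else branch in a naive reading, the catch-all case here simply files the record in the general store, so every call terminates.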

Table 1: Genome Classification Based on Weight and 
Location 
Table 2: Genome Classification Based on Weight and Location (Continuation)
IV. DNA DATA COMPARISON SIMILARITY ANALYSIS

Similarity analysis plays a central role in this algorithm. To speed up processing, all non-numeric attributes, such as location and shape, are mapped to corresponding numeric values. Once the cluster has been identified with the clustering algorithm of Section III, similarity analysis takes over and compares the healthy tissue attributes with the user input. Much like comparing a photograph with a person, this process can identify the closest matching disease or diseases. If the key attributes (gene identification and location) match, the algorithm scans the entire user input and checks its values against the corresponding healthy tissue attributes for deviation. If a deviation is found in either the increasing or the decreasing direction, it reads the disease rules to which the deviation belongs; the result may be a single disease or multiple diseases.
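The mapping of non-numeric attributes to numbers can be sketched as a dictionary encoder that gives each distinct location or shape a stable integer code. The `CategoryEncoder` class, `encode_record` helper and sample values are hypothetical, a minimal illustration rather than the paper's own mapping.

```python
class CategoryEncoder:
    """Assigns each distinct category an integer code in order of first appearance."""

    def __init__(self):
        self.codes = {}

    def encode(self, value):
        if value not in self.codes:
            self.codes[value] = len(self.codes)
        return self.codes[value]


def encode_record(record, encoders):
    """Return a copy of `record` with its non-numeric attributes replaced by codes."""
    encoded = dict(record)
    for attr, encoder in encoders.items():
        encoded[attr] = encoder.encode(record[attr])
    return encoded


encoders = {"location": CategoryEncoder(), "shape": CategoryEncoder()}
print(encode_record({"gene_id": "G1", "location": "L7", "shape": "linear", "weight": 12.0},
                    encoders))
# {'gene_id': 'G1', 'location': 0, 'shape': 0, 'weight': 12.0}
```

The codes are nominal: equality between them is meaningful, so key-attribute matching (for example, the same location) works on integers, but the numeric distance between two codes carries no information.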

Algorithm for Similarity Analysis

Function similarity_test()
begin
    intake (elements)
    if (weight >= 1 and weight <= 20) then
    begin
        // bypass search operation to VLDB_cluster20 instead of searching the entire database
        for each record in VLDB_cluster20
        begin
            if (gene_id = record.gene_id and location = record.location) then
            begin
                read (disease_rule);
                up := (weight > disease_rule.weight) or (size > disease_rule.size)
                      or (strength > disease_rule.strength)
                down := (weight < disease_rule.weight) or (size < disease_rule.size)
                        or (strength < disease_rule.strength)
                if (up and down) then write (Combination)
                else if (up) then write (upper_disease)
                else if (down) then write (lower_disease)
            end;
        end;
    end
    else if (weight > 20 and weight <= 50) then
    begin
        // bypass search operation to VLDB_cluster50 instead of searching the entire database
        for each record in VLDB_cluster50
        begin
            if (gene_id = record.gene_id and location = record.location) then
            begin
                read (disease_rule);
                up := (weight > disease_rule.weight) or (size > disease_rule.size)
                      or (strength > disease_rule.strength)
                down := (weight < disease_rule.weight) or (size < disease_rule.size)
                        or (strength < disease_rule.strength)
                if (up and down) then write (Combination)
                else if (up) then write (upper_disease)
                else if (down) then write (lower_disease)
            end;
        end;
    end
    else
        // similar process over the general VLDB_cluster (weights outside 1 to 50)
end function
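A runnable Python sketch of the similarity test is given below. It assumes that each healthy record in a cluster carries the reference values the disease rule is compared against, that the cluster is selected by the sample's weight (1 to 20, above 20 up to 50, otherwise the general VLDB_cluster), and that "Combination" means some attributes deviate upward while others deviate downward. The record layout, the `no_deviation` label and the function names are hypothetical.

```python
ATTRS = ("weight", "size", "strength")


def select_cluster(weight):
    """Pick the cluster to search from the sample's weight."""
    if 1 <= weight <= 20:
        return "VLDB_cluster20"
    if 20 < weight <= 50:
        return "VLDB_cluster50"
    return "VLDB_cluster"


def similarity_test(sample, clusters):
    """Return one deviation label per healthy record matching the sample's key attributes."""
    results = []
    # Bypass the full database: search only the cluster chosen by weight.
    for record in clusters.get(select_cluster(sample["weight"]), []):
        if record["gene_id"] != sample["gene_id"] or record["location"] != sample["location"]:
            continue
        up = any(sample[a] > record[a] for a in ATTRS)
        down = any(sample[a] < record[a] for a in ATTRS)
        if up and down:
            results.append("Combination")
        elif up:
            results.append("upper_disease")
        elif down:
            results.append("lower_disease")
        else:
            results.append("no_deviation")
    return results


clusters = {"VLDB_cluster20": [{"gene_id": "G1", "location": "L1",
                                "weight": 10.0, "size": 50.0, "strength": 0.8}]}
sample = {"gene_id": "G1", "location": "L1", "weight": 12.0, "size": 45.0, "strength": 0.8}
print(similarity_test(sample, clusters))  # ['Combination']
```

Because the sample's own weight chooses the cluster, a sample whose weight has drifted across a band boundary is compared against the wrong cluster; searching the neighbouring band as well is one possible refinement.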



SIMILARITY ANALYSIS FOR BRCA1 GENOME 
Figure 3: Similarity Search (Normal Attributes)

V. CONCLUSION

The Typical Genomic Framework on Disease Analysis (TGFDA) algorithm addresses the research area of genes and their behavior. As a further enhancement, similarity analysis combined with path analysis can be implemented to analyze groups of genes that cause disease. While a group of genes may contribute to a disease process, different genes may become active at different stages of the disease. If the sequence of genetic changes across the stages of disease development can be identified, this may lead to the development of new medicines that target the different stages of a disease.